ABSTRACT : 

When the sizes of the training sets are small, classification 
in a subspace of the original data space may give rise to a smaller 
probability of error than the classification in the data space itself. 
This is because the gain in the accuracy of estimation of the likeli- 
hood functions used in classification in the lower dimensional space 
(subspace) offsets the loss of information associated with dimension- 
ality reduction (feature extraction). To test this conjecture, a 
computer simulation was performed. A number of pseudo-random 
training and data vectors were generated from two four-dimensional 
Gaussian classes. An algorithm previously described (ICSA Technical 
Report #275-025-022, EE Technical Report #7520) was used to create 
an optimal one-dimensional feature space on which to project the 
data. When the sizes of the training sets were small, classification 
of the data in the optimal one-dimensional space was found to yield 
lower error rates than the one in the original four-dimensional space. 
Specifically, depending on the sizes of the training sets, the 
improvement ranged from 11% to 1%. 

I. Introduction : 

In real pattern recognition systems, the situation often arises 
that the classifier as well as the feature extractor must be designed 
with a limited number of training samples. As a result, in certain 
cases, the estimates of the class conditional statistics which are 
used to determine the classification strategy are poor. 

When Gaussian statistics are assumed and the dimension of the 
raw data is n, then the n elements of the mean vector $\bar{x}_j$ for 
class j, as well as the n(n + 1)/2 independent elements of the 
covariance matrix $R_j$ for class j, are estimated using the formulas : 

    $\bar{x}_j = \frac{1}{N_j} \sum_{x_i \in S_j} x_i$                                  (1) 

    $R_j = \frac{1}{N_j} \sum_{x_i \in S_j} (x_i - \bar{x}_j)(x_i - \bar{x}_j)^T$      (2) 

where $S_j$ is the training set representing class j and $N_j$ 
is the total number of training vectors $x_i$ in $S_j$. 

It is well known that the uncertainties of these estimates 
decrease monotonically with increasing $N_j$ and decreasing n [2]. 
The number of training samples necessary to obtain a non-singular 
estimate of the covariance matrix must be greater than or equal 
to n + 1. However, to obtain a really good estimate of the 
covariance as well as the mean, several times this number of 
training samples is often needed [2]. 
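
The n + 1 requirement can be checked on a small hand-built example. The sketch below is illustrative only: it assumes the 1/N normalization for the covariance estimate of formula (2), uses made-up training vectors (the four unit vectors, with and without the origin), and the helper names `mean_vector`, `covariance_matrix`, and `determinant` are hypothetical.

```python
def mean_vector(samples):
    """Sample mean of a list of equal-length vectors, as in formula (1)."""
    count, dim = len(samples), len(samples[0])
    return [sum(x[k] for x in samples) / count for k in range(dim)]


def covariance_matrix(samples):
    """Covariance about the sample mean, as in formula (2) (1/N assumed)."""
    count, dim = len(samples), len(samples[0])
    m = mean_vector(samples)
    return [[sum((x[a] - m[a]) * (x[b] - m[b]) for x in samples) / count
             for b in range(dim)] for a in range(dim)]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    size = len(a)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det


# Illustrative data only: n = 4, training vectors are the unit vectors.
unit_vectors = [[1.0 if i == k else 0.0 for k in range(4)] for i in range(4)]

# N = n = 4 samples: the estimate is singular (zero up to rounding).
print(determinant(covariance_matrix(unit_vectors)))

# N = n + 1 = 5 samples: the estimate is non-singular (close to 3.2e-4).
print(determinant(covariance_matrix([[0.0] * 4] + unit_vectors)))
```

With N = n the deviations from the sample mean span at most n - 1 directions, so the estimate cannot have full rank; one more sample in general position removes the deficiency.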

When the ratio $N_j / n$ (j = 1, ..., M), where M is the total 
number of classes, tends to infinity, classification results obtained 
using all available features are superior to those obtained using 
any transformation of the original space into a lower dimensional 
space. However, when Gaussian pattern classes are present 
and the ratio $N_j / n$ (j = 1, ..., M) is small, the feature 
extraction method presented in [1] can be of great value. When these 
conditions are satisfied, one can sometimes obtain better classifica- 
tion performance by using the optimal single linear Gaussian feature 
than by using all n features. This is so because when the 
dimensionality of the data is reduced to unity, the estimate of the mean 
in the reduced space is a one-dimensional estimate rather than an 
n-dimensional estimate. Similarly, the estimate of the class 
conditional covariance is merely the one-dimensional variance estimate 
rather than the n x n covariance matrix given by (2). 
Essentially, then, the ratio $N_j / m$ (where m is the dimension of 
the space in which classification is made) is increased with the 
reduction of dimensionality from n to m = 1 . Hence the uncer- 
tainties in the mean and covariance estimates are reduced. This 
gain in accuracy in estimation may offset the loss of information 
resulting from the dimensionality reduction. Thus, in certain cases, 
results from classification obtained using our optimal single linear 
Gaussian feature can give rise to a lower probability of error than 
those obtained using all available features. Numerical results from 
the computer simulation described in the following section do indeed 
attest to this fact. 
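
As a worked illustration of the argument above, take the values used in the simulation of the next section, n = 4 and a training set of $N_j$ = 5 vectors per class. The count below is simple arithmetic on the quantities already defined, not a result from the report.

```latex
% Quantities estimated per class: mean elements plus independent covariance elements
\[
  n + \frac{n(n+1)}{2}\,\Big|_{n=4} = 4 + 10 = 14,
  \qquad
  m + \frac{m(m+1)}{2}\,\Big|_{m=1} = 1 + 1 = 2 .
\]

% Ratio of training-set size to dimension, before and after the reduction
\[
  \frac{N_j}{n} = \frac{5}{4} = 1.25
  \quad\longrightarrow\quad
  \frac{N_j}{m} = \frac{5}{1} = 5 .
\]
```

The same five training vectors therefore support two estimated quantities per class instead of fourteen.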

II. Numerical Results : 

To verify the preceding argument, the following test procedure 
was conducted. A number of pseudo-random data vectors from two 
four-dimensional Gaussian classes were generated. N of these 
samples from each class were used to compose a training set from 
which the class conditional statistics given by (1) and (2) were 
obtained. Using these estimates, the optimal single linear Gaussian 
feature was found. The remainder of the pseudo-random data 
vectors were transformed using the optimal single linear Gaussian 
feature and were classified in the reduced space. Classification of 
these same samples was also performed in the untransformed space. 
The classification performance, defined as the ratio of the number of 
samples classified properly to the total number of classifications 
made, was computed for each method and is listed in Table I. 
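
The procedure can be sketched in a few lines of Python. This is a rough sketch under several assumptions, not the experiment reported here: the class means and standard deviations are hypothetical (the ones actually used are not listed), the covariance estimate uses 1/N normalization, and Fisher's linear discriminant direction stands in for the optimal single linear Gaussian feature of [1], whose algorithm is not reproduced in this report. All function names are illustrative, and the accuracies it prints are not those of Table I.

```python
import math
import random


def mean_cov(samples):
    """Estimate the mean vector and covariance matrix (1/N normalization assumed)."""
    count, dim = len(samples), len(samples[0])
    mean = [sum(x[k] for x in samples) / count for k in range(dim)]
    cov = [[sum((x[a] - mean[a]) * (x[b] - mean[b]) for x in samples) / count
            for b in range(dim)] for a in range(dim)]
    return mean, cov


def solve(matrix, rhs):
    """Solve matrix * z = rhs by Gauss-Jordan elimination; also return log|det|.

    Raises ValueError if the matrix is exactly singular."""
    size = len(matrix)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    log_det = 0.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        log_det += math.log(abs(a[col][col]))
        for r in range(size):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, size + 1):
                    a[r][c] -= factor * a[col][c]
    return [a[i][size] / a[i][i] for i in range(size)], log_det


def log_likelihood(x, mean, cov):
    """Gaussian log-likelihood of x, without the constant shared by both classes."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    z, log_det = solve(cov, diff)
    return -0.5 * sum(d * zi for d, zi in zip(diff, z)) - 0.5 * log_det


def accuracy(test_sets, stats):
    """Fraction of test vectors assigned to their true class by maximum likelihood."""
    correct = total = 0
    for label, vectors in enumerate(test_sets):
        for x in vectors:
            scores = [log_likelihood(x, m, c) for m, c in stats]
            correct += scores.index(max(scores)) == label
            total += 1
    return correct / total


def fisher_direction(stats):
    """Stand-in feature: w = (R_1 + R_2)^{-1} (mean_1 - mean_2)."""
    (m1, c1), (m2, c2) = stats
    dim = len(m1)
    pooled = [[c1[a][b] + c2[a][b] for b in range(dim)] for a in range(dim)]
    w, _ = solve(pooled, [u - v for u, v in zip(m1, m2)])
    return w


def run_trial(n_train, n_test=200, seed=0):
    """Return (accuracy in the 1-D feature space, accuracy in the 4-D space)."""
    rng = random.Random(seed)
    # Hypothetical class parameters: (mean, per-component standard deviation).
    classes = [([0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]),
               ([0.5, 0.5, 0.0, 0.0], [1.5, 1.0, 1.0, 0.7])]

    def draw(mean, stds, count):
        return [[rng.gauss(m, s) for m, s in zip(mean, stds)] for _ in range(count)]

    train = [draw(m, s, n_train) for m, s in classes]
    test = [draw(m, s, n_test) for m, s in classes]
    stats_full = [mean_cov(t) for t in train]
    w = fisher_direction(stats_full)

    def project(vectors):
        return [[sum(wi * xi for wi, xi in zip(w, x))] for x in vectors]

    # Statistics are re-estimated in the reduced space from the projected training set.
    stats_1d = [mean_cov(project(t)) for t in train]
    return accuracy([project(t) for t in test], stats_1d), accuracy(test, stats_full)


if __name__ == "__main__":
    for n_train in (5, 10, 20, 30, 40, 50):
        acc_1d, acc_4d = run_trial(n_train)
        print(n_train, round(acc_1d, 3), round(acc_4d, 3))
```

A single seeded trial is noisy, so averaging `run_trial` over many seeds would give a steadier comparison between the two classifiers at each N.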

These results show that, for small values of N / n (N = 5, 10, and 
20 in Table I), classification using the optimal single linear 
Gaussian feature outperforms classification using all four features. 
At higher values of N / n (N = 30, 40, and 50), the single feature 
still gives comparable performance, trailing by at most .040. 



                       Classification Performance
   Number of           ------------------------------------
   Training Samples    Optimal Single      All 4 Available
   (N)                 Linear Gaussian     Features
                       Feature
   --------------------------------------------------------
    5                  .590                .485
   10                  .610                .600
   20                  .610                .600
   30                  .605                .630
   40                  .590                .630
   50                  .610                .640


   Table I  Classification Performance for 
            Varying Sizes of Training Sets 
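
The margins between the two columns of Table I can be tabulated directly. The short script below only re-expresses the table entries; the names `TABLE_I` and `margins` are illustrative.

```python
# Rows of Table I: (N, performance with the single feature, performance with all 4).
TABLE_I = [
    (5, 0.590, 0.485),
    (10, 0.610, 0.600),
    (20, 0.610, 0.600),
    (30, 0.605, 0.630),
    (40, 0.590, 0.630),
    (50, 0.610, 0.640),
]


def margins(rows):
    """Return (N, single-feature performance minus four-feature performance)."""
    return [(n, round(single - full, 3)) for n, single, full in rows]


for n, margin in margins(TABLE_I):
    # n / 4 is the ratio N / n for the four-dimensional data.
    print(f"N = {n:2d}   N/n = {n / 4:5.2f}   margin = {margin:+.3f}")
```

A positive margin means the single feature did better; the sign changes between N = 20 and N = 30, that is, between N / n = 5 and N / n = 7.5.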

III. Conclusions : 

In the test case presented, it is readily noted that for low 
values of N / n , the classification performance obtained using 
the optimal single linear Gaussian feature exceeds that obtained 
using all available features. Similar results were obtained in [3] by 
classifying with subsets of the available features. Thus for 
low values of N / n , one realizes certain advantages from using 
this approach. First, a reduction in computer storage and mathe- 
matical computation is achieved. More importantly, one may 
improve the performance of the classifier. 

The effect of having a small number of vectors in the training 
set on other algorithms ought to be explored. 